Method and Apparatus for Converting Globally Clock-Gated Circuits to Locally Clock-Gated Circuits

ABSTRACT

A method for converting globally clock-gated circuits to locally clock-gated circuits is disclosed. A timing analysis is initially performed on an integrated circuit (IC) design to generate a slack time report for all globally clock-gated circuits within the IC design. Based on their respective slack time indicated in the slack time report, all globally clock-gated circuits that should be connected to locally generated clocks are identified. After disconnecting from a global clock tree, each of the identified globally clock-gated circuits is subsequently connected to a locally generated clock having a clock delay comparable to its slack time indicated in the slack time report.

RELATED PATENT APPLICATION

The present application is a divisional of U.S. Patent application Ser. No. 10/904,397 (Atty. Docket No. BUR920040011US1), filed on Nov. 8, 2004, and entitled, “Method and Apparatus for Converting Globally Clock-Gated Circuits to Locally Clock-Gated Circuits,” which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to integrated circuit design methods in general, and, in particular, to a method for assigning clock-gated circuits within an integrated circuit design. Still more particularly, the present invention relates to a method for converting globally clock-gated circuits to locally clock-gated circuits within an integrated circuit design.

2. Description of Related Art

A digital integrated circuit (IC) design typically employs many clock-gated circuits, such as flip-flops, latches, etc., that are periodically clocked by edges of a clock signal. Since there is a very large number (thousands or millions) of clock-gated circuits within an IC design, a single clock signal driver normally cannot directly supply a clock signal to all of the clock-gated circuits. Instead, a global clock tree having a set of buffers arranged in a tree-like network is utilized to supply clock signals to various clock-gated circuits. All circuits clocked by a global clock tree are considered as globally clock-gated circuits.

In order to ensure proper synchronization between various parts of a circuit design, each clock signal edge should reach all synchronization points at substantially the same time. Thus, the time required for a clock signal edge to travel from its source to any clock-gated circuit should be substantially the same for all paths it follows through the global clock tree. The time required for a clock signal edge to work its way through the global clock tree from its source to a globally clock-gated circuit depends on many factors, such as the lengths of conductors in the path, the number of buffers the edge must pass through, the switching delay of each buffer, the amount of attenuation the clock signal incurs between buffer stages, and the load each buffer must drive. Accordingly, the global clock tree needs to be balanced by ensuring that all clock signal paths between any two tree levels are of substantially similar length and impedance, that all buffers at any level of the global clock tree drive the same number of buffers or globally clock-gated circuits at the next level of the global clock tree, and that all buffers on any given level have similar characteristics.

Generally speaking, global clock trees consume a relatively large amount of power. Global clock trees typically account for approximately 30-60% of the total power consumption of an IC design. In addition, the clocking of a global clock tree requires a rigid boundary between pipeline stages such that all logic must line up upon the boundaries. Thus, the ability to improve performance either in the current pipeline stage or in the next pipeline stage becomes locked to the clock boundary. The present disclosure provides a method for reducing overall clocking power consumption of an IC design such that additional flexibility in clock management can be achieved.

SUMMARY OF THE INVENTION

In accordance with a preferred embodiment of the present invention, a timing analysis is initially performed on an integrated circuit (IC) design to generate a slack time report for all globally clock-gated circuits within the IC design. Based on their respective slack time indicated in the slack time report, all globally clock-gated circuits that should be connected to locally generated clocks are identified. After disconnecting from a global clock tree, each of the identified globally clock-gated circuits is subsequently connected to a locally generated clock having a clock delay comparable to its slack time indicated in the slack time report.

All features and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a conventional global clock tree for providing a common clock signal input to globally clock-gated circuits within an integrated circuit;

FIG. 2 is a high-level logic flow diagram of a method for converting globally clock-gated circuits to locally clock-gated circuits, in accordance with a preferred embodiment of the present invention;

FIG. 3 is a block diagram of a locally generated clock connected to two locally clock-gated circuits, in accordance with a preferred embodiment of the present invention;

FIG. 4 is a high-level logic flow diagram of a method for determining whether or not a globally clock-gated circuit should be converted to a locally clock-gated circuit, in accordance with a preferred embodiment of the present invention; and

FIG. 5 is a block diagram of a computer system in which a preferred embodiment of the present invention is incorporated.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Referring now to the drawings and in particular to FIG. 1, there is depicted a block diagram of a conventional global clock tree for providing a common clock signal input to clock-gated circuits, such as flip-flops or latches, within an integrated circuit (IC). As shown, a global clock tree 10 includes an array of buffers 12-13 to fan out a CLOCK signal generated from a clock signal source 11. Typically, global clock tree 10 is locked tightly to a specific frequency with virtually zero jitter and clock drift across an entire IC design. In the embodiment shown in FIG. 1, two first stage buffers 12 fan the CLOCK signal out to four second stage buffers 13 that, in turn, fan the CLOCK signal out to thirty-two sinks 14. The number of buffer stages, the number of buffers per stage and the number of buffers or sinks each buffer drives are matters of design choice that depend on factors such as load capacity of buffers forming global clock tree 10, input impedance of the devices being driven, path impedances and allowable signal attenuation between stages, etc.

Many circuits in the digital portion of an IC design change their logic states very infrequently but continue to be clocked in a synchronous fashion by a high-power clock tree, such as global clock tree 10 in FIG. 1, on every clock cycle. Such practice adds unnecessary power consumption in clock distributions and latch activities. The present invention allows some globally clock-gated circuits within an IC design that switch infrequently to be converted to locally clock-gated circuits (i.e., using a locally generated delay clock). By reducing the number of circuits within an IC design that switch simultaneously on the high-power clock tree or global clock tree, power consumption and chip noise can both be reduced.

Although the localized delay clock still consumes power, an overall power reduction can be achieved if the new clock topology (i.e., one with a smaller global clock tree and the locally generated clock circuits) demands less power than the original unmodified global clock tree. Another advantage of reducing the number of globally clock-gated circuits locked to a global clock tree is that the launch noise of the set of globally clock-gated circuits driven on the global clock tree can also be reduced. Basically, the amount of simultaneous noise is reduced via a frequency spectrum spreading, which is an effect of using localized delay clocking.

With reference now to FIG. 2, there is illustrated a high-level logic flow diagram of a method for converting globally clock-gated (or synchronous) circuits to locally clock-gated circuits, in accordance with a preferred embodiment of the present invention. Starting at block 21, a synchronous IC design having multiple globally clock-gated circuits, such as latches, flip-flops, etc., is simulated using functional test vectors that are deemed to cover a wide range of normal operating conditions. If no functional test vectors are available, the synchronous IC design may be simulated using automatic test pattern generation (ATPG) vectors. In either case, a logic circuit is formed with simulation results for the IC design in question. A timing analysis is then performed on the synchronous IC design, as shown in block 22.

Based on the result of the timing analysis, each globally clock-gated circuit is categorized in a respective group according to its slack time, as depicted in block 23. For the purpose of the present invention, slack time is defined to include the amount of time margin for a globally clock-gated circuit to receive an input signal, and the amount of time margin for the globally clock-gated circuit to deliver an output signal to another circuit. Each globally clock-gated circuit can be generally placed under a positive slack time group or a negative slack time group according to the timing analysis. Globally clock-gated circuits with a positive slack time are defined as globally clock-gated circuits that are able to complete their switch operation before their allocated time under the IC design specification. Each globally clock-gated circuit in the positive slack time group is then further categorized according to a specific range of slack time under which the globally clock-gated circuit falls.

For the globally clock-gated circuits with a positive slack time, a process is performed to identify all the globally clock-gated circuits that can be connected to a locally generated clock, as shown in block 24. Such process will be further explained in detail with reference to FIG. 4.

A locally generated clock is generated for each slack time range, as depicted in block 25. For example, a slack time of 1 ns to 10 ns can be divided into three ranges, with range 1 for slack time from 1 to less than 4 ns, range 2 for slack time from 4 to less than 7 ns, and range 3 for slack time from 7 to less than 10 ns (the above-mentioned slack times include both input and output timing margins). In order to accommodate the three slack time ranges, three locally generated clocks are then generated, with the first one designed for slack time range 1, the second one designed for slack time range 2 and the third one designed for slack time range 3.

Each globally clock-gated circuit that has been identified for connecting to a locally generated clock is then disconnected from a global clock tree and connected to a locally generated clock for the specific range of slack time under which the globally clock-gated circuit falls, as shown in block 26. For example, if a globally clock-gated circuit has been identified (from block 24) for connecting to a locally generated clock, and the globally clock-gated circuit has been determined (from block 22) to have a slack time of 5 ns, the globally clock-gated circuit is then disconnected from a global clock tree and connected to a locally generated clock designed for slack time from 4 to less than 7 ns. In some instances, manual adjustments to the circuit delays associated with locally generated delay clocks may be required.
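The range lookup described in blocks 25 and 26 can be illustrated with a minimal Python sketch. The range boundaries are the 1 ns to 10 ns example given above; the function names, the circuit names, and the `local_clk_range_N` labels are hypothetical and are introduced only for illustration.

```python
from bisect import bisect_right

# Range boundaries in ns from the example in the text:
# range 1 = [1, 4), range 2 = [4, 7), range 3 = [7, 10).
RANGE_EDGES_NS = [1.0, 4.0, 7.0, 10.0]


def slack_range_index(slack_ns, edges=RANGE_EDGES_NS):
    """Return the 1-based slack range for slack_ns, or None if it lies outside all ranges."""
    if slack_ns < edges[0] or slack_ns >= edges[-1]:
        return None
    # bisect_right counts the edges that are <= slack_ns, which is the range number.
    return bisect_right(edges, slack_ns)


def assign_local_clocks(slack_by_circuit, identified, edges=RANGE_EDGES_NS):
    """Map each identified circuit to a (hypothetical) local clock name for its slack range."""
    assignment = {}
    for name in identified:
        index = slack_range_index(slack_by_circuit[name], edges)
        if index is not None:
            assignment[name] = f"local_clk_range_{index}"
    return assignment


if __name__ == "__main__":
    slacks = {"ff_a": 5.0, "ff_b": 0.2, "ff_c": 8.5}
    print(assign_local_clocks(slacks, ["ff_a", "ff_b", "ff_c"]))
```

In this sketch, a circuit with a slack time of 5 ns falls in range 2, matching the example above, while a circuit whose slack time lies outside every range is simply left off the assignment and would remain on the global clock tree.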

After the completion of the synthesis, placement and wiring, etc., a timing analysis is performed on the entire IC design again, as shown in block 27. This timing analysis ensures that, after the above-mentioned clock modification, the entire IC design functions as intended and the timing specification of the entire IC design is satisfied.

A determination is made as to whether or not the IC design meets the timing requirement, as shown in block 28. If the IC design does not meet the timing requirement, the process returns to block 23 for a different slack time grouping. Otherwise, if the IC design meets the timing requirement, the process is complete.

Referring now to FIG. 3, there is depicted a block diagram of a locally generated clock connected to two locally clock-gated circuits, in accordance with a preferred embodiment of the present invention. As shown, a local clock generator 31 is connected to locally clock-gated circuits 32 and 33 (both clock-gated circuits 32 and 33 were formerly globally clock-gated circuits connected to a global clock tree) via two different groups of delay elements. For example, locally clock-gated circuit 32 receives clock signals from local clock generator 31 via two delay elements, and locally clock-gated circuit 33 receives clock signals from local clock generator 31 via three delay elements.

In the generation of delayed clocks that are routed within an IC design, each delayed clock must fall within the required timing specification to guarantee the slack time for the entire process range of the technology. If the delay chain is generated in an open-ended fashion where a source clock (from a local clock generator) is injected at the beginning of the delay chain and delayed clocks are tapped off from the delay chain, each stage of the delay chain is more susceptible to process, voltage, and temperature variation than the previous stage because each tapped delay is additive. To provide low jitter for each tap of the tapped delay line, the delay line may be closed with feedback in a ring fashion and a master source clock may be used as a reference comparison to the delay chain input. The master source clock and feedback input to the first stage of the delay chain can be compared to align with one another. If the two clocks do not align, tail currents can be added or subtracted equally to each stage of the delay chain until the two clocks align. Such calibration procedure allows for multiple delay chains to be calibrated to a single master source clock and provide a solution where each delayed clock phase used on the IC design has comparable jitter.
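The calibration loop described above can be sketched numerically in Python. This is a toy model under stated assumptions, not a circuit implementation: it assumes that each stage delay falls linearly as tail current is added, that "alignment" means the total chain delay equals one master clock period, and that the current is adjusted in fixed steps. The function name, the gain, the step size, and the tolerance are all hypothetical.

```python
def calibrate_delay_chain(stage_base_delays_ns, master_period_ns,
                          gain_ns_per_ua=0.01, step_ua=1.0,
                          tolerance_ns=0.025, max_steps=10_000):
    """Adjust one shared tail-current offset until the chain delay matches the master period.

    Assumed model: stage delay = base delay - gain * tail-current offset.
    Returns (tail_current_offset_ua, per_stage_delays_ns).
    """
    offset_ua = 0.0
    for _ in range(max_steps):
        delays = [d - gain_ns_per_ua * offset_ua for d in stage_base_delays_ns]
        error_ns = sum(delays) - master_period_ns
        if abs(error_ns) <= tolerance_ns:
            return offset_ua, delays
        # Chain too slow: add current to every stage equally; too fast: subtract.
        offset_ua += step_ua if error_ns > 0 else -step_ua
    raise RuntimeError("delay chain did not align with the master clock")


if __name__ == "__main__":
    offset, delays = calibrate_delay_chain([1.1, 1.0, 1.2, 1.0], master_period_ns=4.0)
    print(offset, [round(d, 3) for d in delays])
```

Because the same offset is applied to every stage, the relative spacing of the taps is preserved while the whole chain is pulled toward the master clock. In this fixed-step model the tolerance must be at least half of the delay change produced by one step, otherwise the loop can oscillate around the target without ever settling.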

In order to determine whether or not a globally clock-gated circuit should be converted to a locally clock-gated circuit, four inputs are preferably utilized, and they are: a logic circuit netlist, a switching factor associated with the clock-gated circuit, a switching factor threshold, and don't touch markers.

The “switching factor” for a data input to a globally clock-gated circuit is generated by two values from the simulation results: (1) a total number of clock-signal switches present at the globally clock-gated circuit, and (2) a total number of data input switches present at the same globally clock-gated circuit. The switching factor is determined by the ratio of data input switches to clock-signal switches within the same time interval.
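The ratio defined above can be sketched in a few lines of Python. The sketch assumes that the simulation results are available as lists of sampled logic values covering the same time interval; the function names and the sample waveforms are hypothetical.

```python
def count_switches(samples):
    """Count transitions between consecutive sampled logic values."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def switching_factor(data_switches, clock_switches):
    """Ratio of data-input switches to clock-signal switches over the same interval."""
    if clock_switches <= 0:
        raise ValueError("clock_switches must be positive")
    return data_switches / clock_switches


if __name__ == "__main__":
    clock = [0, 1, 0, 1, 0, 1, 0, 1, 0]   # 8 clock-signal switches
    data = [0, 0, 0, 1, 1, 1, 1, 0, 0]    # 2 data-input switches
    print(switching_factor(count_switches(data), count_switches(clock)))  # 0.25
```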

A user-specified “switching factor threshold” may be used to indicate which globally clock-gated circuits should be converted to corresponding locally clock-gated circuits. Specifically, clock-gated circuits whose data-input switching factors exceed the switching factor threshold are targeted for conversion. The switching factor threshold may be selected by a user to be any value between 0 and 1 although, for example, it may not be recommended to use a switching factor threshold greater than 0.5.

A circuit designer may desire to override the conversion process for any globally clock-gated circuit within an IC design. A don't touch marker can be applied to any globally clock-gated circuit within an IC design that is intended to remain connected to a global clock tree (instead of being connected to a localized delay clock).

With reference now to FIG. 4, there is illustrated a high-level logic flow diagram of a method for determining whether or not a globally clock-gated circuit should be converted to a locally clock-gated circuit, in accordance with a preferred embodiment of the present invention. Starting at block 41, a determination is made as to whether or not a globally clock-gated circuit is a “don't touch” circuit (i.e., whether or not a “don't touch” marker has been applied), as shown in block 42. If the globally clock-gated circuit is not a “don't touch” circuit, then a determination is made as to whether or not a switching factor of the globally clock-gated circuit is greater than a predetermined switching factor threshold, as shown in block 43. Each globally clock-gated circuit in the IC design is checked for a “don't touch” marker, and any globally clock-gated circuit marked “don't touch” is left unchanged.

If the switching factor of the globally clock-gated circuit is greater than the predetermined switching factor threshold, then the globally clock-gated circuit is converted to a corresponding locally clock-gated circuit, as shown in block 44. The globally clock-gated circuit can be converted to a corresponding locally clock-gated circuit by disconnecting the globally clock-gated circuit from a global clock tree and connecting the globally clock-gated circuit to a locally generated delay clock. Otherwise, if the switching factor of the globally clock-gated circuit is not greater than the predetermined switching factor threshold, the process proceeds to block 45.

As depicted in block 45, a determination is made as to whether or not there is any other globally clock-gated circuit left to be processed. If there is a globally clock-gated circuit left to be processed, the process returns to block 42. Otherwise, if there is no globally clock-gated circuit left to be processed, the process is completed, as shown in block 46.
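The decision flow of blocks 42 through 46 can be summarized in a short Python sketch. It follows the comparison exactly as stated in the description (a circuit is selected when its switching factor is greater than the threshold and it carries no don't touch marker). The `GatedCircuit` record and the function name are hypothetical, and the actual rewiring of a selected circuit is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class GatedCircuit:
    name: str
    switching_factor: float
    dont_touch: bool = False


def select_for_conversion(circuits, threshold):
    """Return the names of circuits to convert, following blocks 42-45 of FIG. 4."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    selected = []
    for circuit in circuits:                       # block 45: repeat until none are left
        if circuit.dont_touch:                     # block 42: marked circuits stay unchanged
            continue
        if circuit.switching_factor > threshold:   # block 43: compare with the threshold
            selected.append(circuit.name)          # block 44: target for conversion
    return selected


if __name__ == "__main__":
    design = [
        GatedCircuit("latch_a", 0.40),
        GatedCircuit("latch_b", 0.40, dont_touch=True),
        GatedCircuit("latch_c", 0.10),
        GatedCircuit("latch_d", 0.20),
    ]
    print(select_for_conversion(design, threshold=0.20))  # ['latch_a']
```

A circuit whose switching factor equals the threshold is not selected, which is consistent with the "not greater than" branch that proceeds to block 45.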

As has been described, the present invention provides a method and apparatus for converting globally clock-gated circuits to locally clock-gated circuits. In essence, all globally clock-gated circuits with a switching factor greater than a switching factor threshold are converted to corresponding locally clock-gated circuits, and globally clock-gated circuits with a switching factor less than or equal to the switching factor threshold are left unchanged. Once all the globally clock-gated circuits that were targeted for conversion have been converted, simulation is again performed on the entire IC design, with a focus on the locally clock-gated circuit cuts.

By allowing the set of clocks to be generated based upon actual layout and timing reports, the noise spectrum can be spread in such a way as to minimize the overall effect on more timing critical paths, and to reduce noise generated by coupling and by simultaneous switching. In addition, by maximizing the number of local clocks versus the total number of global synchronously generated clocks, the overall power consumption can be reduced.

Referring now to FIG. 5, there is depicted a block diagram of a computer system in which a preferred embodiment of the present invention is incorporated. As shown, a computer system 50 includes a processor 51, a system memory 52 and a hard drive 55. Processor 51 executes instructions and data that are stored in system memory 52. In addition, computer system 50 also includes input devices 53, such as a keyboard and a mouse, and output devices 54, such as a display monitor and a printer.

Although the present invention has been described in the context of a fully functional computer system, those skilled in the art will appreciate that the mechanisms of the present invention are capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media utilized to actually carry out the distribution. Examples of signal bearing media include, without limitation, recordable type media such as floppy disks or CD ROMs and transmission type media such as analog or digital communications links.

While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

1. A method for converting globally clock-gated circuits to locally clock-gated circuits within an integrated circuit design, said method comprising: identifying globally clock-gated circuits that change state infrequently; and converting said identified globally clock-gated circuits to corresponding locally clock-gated circuits.
2. The method of claim 1, wherein said method further includes assigning said identified globally clock-gated circuits to a plurality of groups according to their slack time indicated in a slack time report.
3. The method of claim 2, wherein said method further includes providing said identified globally clock-gated circuits with respective locally generated clocks having a clock delay comparable to their slack time indicated in said slack time report.
4. The method of claim 1, wherein said method further includes performing a timing analysis on said integrated circuit design after said identified globally clock-gated circuits have been connected to said respective locally generated clocks.
5. The method of claim 1, wherein said method further includes determining whether or not a globally clock-gated circuit should be converted to a locally clock-gated circuit.
6. The method of claim 5, wherein said determining further includes utilizing a logic circuit netlist, a switching factor, and a switching factor threshold to determine whether or not a globally clock-gated circuit should be converted to a locally clock-gated circuit.